An evolutionary factor analysis computation for mining website structures

Authors

  • M. Rocío Martínez-Torres
  • Sergio L. Toral Marín
  • Beatriz Palacios
  • Federico Barrero
Abstract

This paper explores website link structure by treating websites as interconnected graphs and analyzing their features as a social network. Two networks are extracted to represent each website: a domain network containing the subdomains or external domains linked through the website, and a page network containing the webpages browsed from the root domain. Factor analysis provides the statistical methodology to adequately extract the main website profiles in terms of their internal structure. However, due to the large number of indicators, the task of selecting a representative subset of indicators becomes unaffordable. A genetic search for an optimum subset of indicators is proposed in this paper, using a multiobjective fitness function based on factor analysis results. The optimum solution provides a coherent and relevant categorization of website profiles, and highlights the possibilities of genetic algorithms as a tool for discovering new knowledge in the field of web mining. © 2012 Elsevier Ltd. All rights reserved.
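The genetic search over indicator subsets described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify their multiobjective fitness function, so the version here assumes a simple stand-in that adds the share of variance captured by the first principal factor (estimated by power iteration on the correlation matrix) to a small reward for keeping more indicators. All names (`correlation_matrix`, `top_eigenvalue`, `fitness`, `genetic_search`, `make_demo_data`), the weights, and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length, non-constant columns."""
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(x - m) / s for x in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def top_eigenvalue(matrix, iterations=100):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        lam = sum(x * x for x in w) ** 0.5
        v = [x / lam for x in w]
    return lam


def fitness(corr, mask, size_weight=0.1, min_size=3):
    """Assumed scalarised fitness: first-factor variance share plus a
    small bonus for retaining more indicators (not the paper's function)."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if len(idx) < min_size:
        return 0.0
    sub = [[corr[i][j] for j in idx] for i in idx]
    share = top_eigenvalue(sub) / len(idx)  # trace of a correlation matrix = size
    return share + size_weight * len(idx) / len(mask)


def genetic_search(corr, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Binary-mask genetic algorithm: tournament selection, uniform
    crossover, bit-flip mutation, and one elite carried forward."""
    rng = random.Random(seed)
    n = len(corr)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(corr, m))
    for _ in range(generations):
        scored = [(fitness(corr, m), m) for m in pop]
        children = [best[:]]  # elitism keeps the best mask found so far
        while len(children) < pop_size:
            p1 = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda m: fitness(corr, m))
    return best, fitness(corr, best)


def make_demo_data(n_samples=80, n_indicators=8, seed=1):
    """Synthetic indicators: the first half share one latent factor,
    the rest are independent noise (purely illustrative data)."""
    rng = random.Random(seed)
    cols = [[] for _ in range(n_indicators)]
    for _ in range(n_samples):
        f = rng.gauss(0, 1)
        for j in range(n_indicators):
            noise = rng.gauss(0, 1)
            cols[j].append(f + 0.3 * noise if j < n_indicators // 2 else noise)
    return cols
```

Calling `genetic_search(correlation_matrix(make_demo_data()))` returns a boolean mask over the indicators together with its fitness score. A closer analogue of the paper would replace the scalarised score with a true multiobjective comparison and a full factor analysis of the selected indicators.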

Similar resources

Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms

Background and Aim: Because web surfing has become part of the lifestyle of a vast majority of people in society, and because more accurate social and cultural policy making is needed in this field, the authors set out to analyze how users view different websites, so as to help politicians and practitioners. Methods: The design science research method is used in this research...

Investigating the Role of Factors Affecting the Frequency of Incidents in Water Mains Using a Hybrid Regression Model

A water distribution network is one of the important parts of infrastructure systems. The efficient management and proactive planning of capital investment of these assets are fundamental for efficient and effective service delivered by water companies. The direct economic costs (i.e. rehabilitation investment, repair costs, water loss, etc.) as well as indirect costs (i.e. service and traffic ...

An Evolutionary Data Clustering Algorithm

Data mining is the process of deriving knowledge from data. Data clustering is a classical activity in data mining. Clustering is the process of grouping objects together in such a way that objects belonging to the same group are similar and those belonging to different groups are dissimilar. In this paper we propose a method to carry out data clustering using Evolutionary Computation. ...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...

Mining Conserved Topological Structures from Large Protein-Protein Interaction Networks

Analysis of Protein-Protein Interaction (PPI) networks is of great significance in evolutionary biology. Because of its high computational cost, multi-PPI network alignment has recently become a hot topic. In this paper, we propose a multi-PPI network alignment technique based on mining conserved topological structures. The most challenging problems in conserved topological structure mining are the large size...

Journal title:
  • Expert Syst. Appl.

Volume 39  Issue 

Pages  -

Publication date 2012